Native Language Identification using Stacked Generalization
نویسندگان
چکیده
Ensemble methods using multiple classifiers have proven to be the most successful approach for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble architectures such as classifier stacking have not been closely evaluated. We present a set of experiments using three ensemble-based models, testing each with multiple configurations and algorithms. This includes a rigorous application of meta-classification models for NLI, achieving state-of-the-art results on three datasets from different languages. We also present the first use of statistical significance testing for comparing NLI systems, showing that our results are significantly better than the previous state of the art. We make available a collection of test set predictions to facilitate future statistical tests.
منابع مشابه
Stacked Sentence-Document Classifier Approach for Improving Native Language Identification
In this paper, we describe the approach of the ItaliaNLP Lab team to native language identification and discuss the results we submitted as participants to the essay track of NLI Shared Task 2017. We introduce for the first time a 2-stacked sentencedocument architecture for native language identification that is able to exploit both local sentence information and a wide set of general–purpose f...
متن کاملAn Ensemble Classification Model for the Diagnosis of Breast Cancer Using Stacked Generalization
Introduction: Breast cancer is one of the most common types of cancer whose incidence has increased dramatically in recent years. In order to diagnose this disease, many parameters must be taken into consideration and mistakes are possible due to human errors or environmental factors. For this reason, in recent decades, Artificial Intelligence has been used by medical practitioners to diagnose ...
متن کاملAn Ensemble Classification Model for the Diagnosis of Breast Cancer Using Stacked Generalization
Introduction: Breast cancer is one of the most common types of cancer whose incidence has increased dramatically in recent years. In order to diagnose this disease, many parameters must be taken into consideration and mistakes are possible due to human errors or environmental factors. For this reason, in recent decades, Artificial Intelligence has been used by medical practitioners to diagnose ...
متن کاملFewer features perform well at Native Language Identification task
This paper describes our results at the NLI shared task 2017. We participated in essays, speech, and fusion task that uses text, speech, and i-vectors for the task of identifying the native language of the given input. In the essay track, a linear SVM system using word bigrams and character 7-grams performed the best. In the speech track, an LDA classifier based only on i-vectors performed bett...
متن کاملReading ability influences native and non-native voice recognition, even for unimpaired readers.
Research suggests that phonological ability exerts a gradient influence on talker identification, including evidence that adults and children with reading disability show impaired talker recognition for native and non-native languages. The present study examined whether this relationship is also observed among unimpaired readers. Learning rate and generalization of learning in a talker identifi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1703.06541 شماره
صفحات -
تاریخ انتشار 2017